Learning to coordinate in a complex and non-stationary world

We study analytically and by computer simulations a complex system of adaptive agents with
finite memory. Borrowing the framework of the Minority Game and using the replica formalism we
show the existence of an equilibrium phase transition as a function of the ratio between the memory
$\lambda$ and the learning rate $\Gamma$ of the agents. We show that, starting from a random configuration, a
dynamic phase transition also exists, which prevents the system from reaching any Nash equilibrium.
Furthermore, in a non-stationary environment, we show by numerical simulations that agents with
infinite memory play worse than others with less memory and that the dynamic transition naturally
arises independently of the initial conditions.



Social interactions pose many coordination problems 
to individuals. Generally social agents face problems of 
sharing and distributing limited resources in an optimal 
way. Examples range from the use of public roads and 
the Internet, to exchanging what we produce with what 
we consume. A solution to problems of this kind invokes
the intervention of a public authority who finds the social 
optimum and imposes or suggests the optimal behavior 
to agents. While such a solution may be easy to find, its 
implementation may be difficult to enforce in practical 
situations. 

Self-enforcing solutions - where agents achieve optimal 
allocation of resources while pursuing their self-interests, 
without explicit communication or agreement with others 
- are of great practical importance. Competitive markets 
are the prototypical example of such a solution: With 
everybody maximizing his own profit and no one really 
caring for global optimality, competitive markets perform 
the remarkable task of leading to system-wide optimality.

Micro-economics and Game Theory have gone quite
far in explaining what equilibria one can expect in social
interactions. However, most of these studies deal with
unrealistic cases with either few players or with many, but
identical, agents. Moreover, the analysis is restricted to
the equilibria which deductively rational players would
agree upon. Such an approach seems unrealistic in cases
involving many individuals with different goals and
characteristics. The computational complexity required by
deductive rationality may easily go far beyond the
capabilities of agents. Inductive thinking, as suggested by
Arthur [?], may be a better suited model of how real
people behave. A growing effort has indeed been put
in recent years into understanding under what conditions
boundedly, inductively rational agents may reach optimal
outcomes. Several learning rules have been found
to lead to optimal outcomes when a single agent "plays"
against nature [?]. Similar results hold for games with
few players, even though non-trivial dynamical effects
can also arise [?].



In this letter we address the problem of how many
heterogeneous adaptive agents learn to coordinate in a
complex, possibly non-stationary, world. We draw
inspiration from recent work on the Minority Game [5] in
order to model a typical situation where a large number
of agents pursue different individual goals, using a
certain number of distributed resources. Optimal use of
resources then becomes a complex coordination problem.

We focus on agents with finite memory and finite
learning rates. We find that, when agents need to "learn"
collectively a fixed structure of interactions, they can
attain close-to-optimal coordination, provided that their
memory extends far enough into the past. As the memory
decreases, the system undergoes a phase transition
to a state where agents are unable to learn and play in a
random way.

More interestingly, we find situations where the agents
are unable to coordinate and to converge to a Nash
equilibrium, so that the game ends in a stationary regime with
no cooperation. This is a purely dynamical effect
which prevents the system from converging
to equilibrium and renders the standard analysis
based on Nash equilibria useless. It is further clear evidence
of the relevance of the tools and ideas of statistical mechanics
in the study of complex socio-economic systems:
dynamical transitions are indeed very well known in statistical
mechanics [?].

The model we study is closely related to the Minority
Game (MG). The reason for this choice is that it allows
us to benefit from the detailed understanding recently
provided by the statistical mechanics approach [?].
On the one hand we can refer to exact
results; on the other we can extend our understanding of
this keystone model of complex adaptive systems.

The model is precisely defined as follows [?]: Agents
live in a world which can be in one of $P$ states, labelled
by an integer $\mu = 1, \ldots, P$. Each agent $i = 1, \ldots, N$ can
choose between two personal strategies, labelled by a spin
variable $s_i$, which prescribe an action $a^\mu_{s_i,i}$ for each state
$\mu$. These actions are drawn from a bimodal distribution
for all $i$, $s$ and $\mu$, such that there are two possible actions:
do something ($a^\mu_{s,i} = 1$) or do the opposite ($a^\mu_{s,i} = -1$).

The payoff received by an agent who plays strategy $s_i$,
while her opponents take strategies $s_{-i} = \{s_j, \forall j \neq i\}$,
is, in the state $\mu$,

$$ u_i^\mu(s_i, s_{-i}) = -a^\mu_{s_i,i}\, A^\mu , \tag{1} $$

where $A^\mu = \sum_j a^\mu_{s_j,j}$. The total payoff to agents is
always negative: The majority of agents receives a negative
payoff whereas only the minority of them gain.
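
The payoff rule can be illustrated with a few lines of Python. This is only a sketch: it assumes the standard Minority Game form $u_i^\mu = -a^\mu_{s_i,i} A^\mu$ for Eq. (1), and the names `draw_actions`, `aggregate` and `payoff`, as well as the sizes and the seed, are hypothetical choices made for the example.

```python
import random

def draw_actions(n_agents, n_states, rng):
    """actions[mu][i] = (a_minus, a_plus): the action prescribed in state mu
    by each of the two strategies of agent i, drawn as +1 or -1."""
    return [[(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in range(n_agents)]
            for _ in range(n_states)]

def aggregate(actions_mu, strategies):
    """A^mu = sum_j a^mu_{s_j, j}; strategies[j] is 0 (s = -) or 1 (s = +)."""
    return sum(a[s] for a, s in zip(actions_mu, strategies))

def payoff(actions_mu, strategies, i):
    """Assumed form of Eq. (1): u_i^mu = -a^mu_{s_i, i} * A^mu."""
    return -actions_mu[i][strategies[i]] * aggregate(actions_mu, strategies)

rng = random.Random(0)                      # illustrative seed
actions = draw_actions(5, 3, rng)           # N = 5 agents, P = 3 states
strategies = [rng.randrange(2) for _ in range(5)]
A = aggregate(actions[0], strategies)
total = sum(payoff(actions[0], strategies, i) for i in range(5))
```

Under this form the payoffs of all agents sum to $-(A^\mu)^2$, which is why the total payoff can never be positive and only agents on the minority side gain.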

The game is repeated many times; the state $\mu$ is drawn
from a uniform distribution $\rho^\mu = 1/P$ at each time and
agents try to estimate, on the basis of past observations,
which of their strategies is the best one. More precisely,
if $s_i(t)$ is the strategy played by agent $i$ at time $t$, we
assume as in [?] that

$$ \mathrm{Prob}[s_i(t) = s] \propto \exp[\Gamma\, U_{s,i}(t)] , \tag{2} $$

where $U_{s,i}(t)$ is the score of strategy $s$ at time $t$ and $\Gamma$ is
a positive constant [10]. Each agent monitors the scores
$U_{s,i}(t)$ of each of her strategies $s$ by

$$ U_{s,i}(t+1) = (1 - \lambda/P)\, U_{s,i}(t) + u_i^{\mu(t)}[s, s_{-i}(t)]/P , \tag{3} $$

where the last term is the payoff agent $i$ would have
received if she had played strategy $s$ at time $t$ - see Eq. (1)
- against the strategies $s_{-i}(t) = \{s_j(t), \forall j \neq i\}$ played
by her opponents at that time.
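
Iterating the score update makes the role of $\lambda$ as a memory parameter explicit. The following is a short check rather than part of the original derivation; it assumes that scores start from $U_{s,i}(0) = 0$.

```latex
U_{s,i}(t) = \frac{1}{P} \sum_{t'=0}^{t-1}
  \left(1 - \frac{\lambda}{P}\right)^{t-1-t'}
  u_i^{\mu(t')}\!\left[s, s_{-i}(t')\right]
```

Past payoffs therefore enter with geometrically decaying weights. For $\lambda \ll P$ the weight of a payoff received $k$ steps ago is approximately $e^{-\lambda k/P}$, so the memory extends over roughly $P/\lambda$ time steps, and $\lambda = 0$ weights the whole past equally (infinite memory).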

In words, Eqs. (2,3) model agents who are more
likely to play strategies which have performed better in the past.
Eqs. (2,3) belong to a class of learning models which has
received much attention recently [?].
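
A single learning step of one agent can be sketched as follows. The function names `choose_strategy` and `update_scores` are hypothetical, and treating $\Gamma = \infty$ as "always play the best-scoring strategy, breaking ties at random" is an assumption of this example.

```python
import math
import random

def choose_strategy(scores, gamma, rng):
    """Sample s in {0, 1} with Prob[s] proportional to exp(gamma * U_s), Eq. (2)."""
    if math.isinf(gamma):
        # Assumed limit Gamma -> infinity: best-scoring strategy, random tie-break.
        best = max(scores)
        return rng.choice([s for s, u in enumerate(scores) if u == best])
    # Subtract the largest score before exponentiating, for numerical stability.
    top = max(scores)
    weights = [math.exp(gamma * (u - top)) for u in scores]
    return 0 if rng.random() * sum(weights) < weights[0] else 1

def update_scores(scores, payoffs, lam, n_states):
    """U_s(t+1) = (1 - lam/P) * U_s(t) + u_s / P, Eq. (3)."""
    return [(1.0 - lam / n_states) * u + p / n_states
            for u, p in zip(scores, payoffs)]
```

Here `payoffs[s]` is meant to be the payoff the agent would have received with strategy `s`, including her own contribution to the aggregate action.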

The relevant parameter [11] is the ratio $\alpha = P/N$
between the "information complexity" $P$ and the number
of agents, and the key quantity we shall look at is the
global efficiency, defined as $\sigma^2 = \langle A^2 \rangle$
(a smaller $\sigma^2$ means better coordination).
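
To make the definition of $\sigma^2$ concrete, here is a minimal, self-contained simulation sketch that estimates $\sigma^2/N$ as a time average. It is not the authors' code: the function name `simulate`, the choice of averaging over the second half of the run, the seed and the requirement that `gamma` be finite are all assumptions of this example, and the payoff is taken in the form $-a^\mu_{s,i} A^\mu$.

```python
import math
import random

def simulate(n_agents, n_states, gamma, lam, n_steps, seed=0):
    """Return the average of A^2 / N over the second half of the run.
    gamma must be finite in this sketch."""
    rng = random.Random(seed)
    # a[mu][i][s]: action of strategy s (0 or 1) of agent i in state mu
    a = [[[rng.choice((-1, 1)) for _ in range(2)] for _ in range(n_agents)]
         for _ in range(n_states)]
    scores = [[0.0, 0.0] for _ in range(n_agents)]
    total, count = 0.0, 0
    for t in range(n_steps):
        mu = rng.randrange(n_states)          # uniformly drawn state of the world
        choice = []
        for u0, u1 in scores:
            # Two strategies: Prob[s = 1] = 1 / (1 + exp(-gamma * (U_1 - U_0)))
            x = gamma * (u1 - u0)
            p1 = 1.0 / (1.0 + math.exp(-x)) if x > -700.0 else 0.0
            choice.append(1 if rng.random() < p1 else 0)
        big_a = sum(a[mu][i][choice[i]] for i in range(n_agents))
        for i in range(n_agents):
            rest = big_a - a[mu][i][choice[i]]  # aggregate action of the others
            for s in (0, 1):
                act = a[mu][i][s]
                payoff = -act * (rest + act)    # agent i counts her own impact
                scores[i][s] = (1.0 - lam / n_states) * scores[i][s] + payoff / n_states
        if t >= n_steps // 2:
            total += big_a * big_a
            count += 1
    return total / (count * n_agents)
```

With `gamma=0.0` the agents play at random and the estimate should be close to 1; comparing this with a run at `gamma > 0` gives a quick feel for how much coordination the learning rule achieves at a given `lam`.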

This model differs from the MG [?] in two important
respects. First, agents correctly compute the payoff for
strategies $s \neq s_i(t)$ which they did not play. In the MG
agents only account for the explicit dependence of $u_i^\mu$ on
$s$ which arises from $a^\mu_{s,i}$ - see Eq. (1) - whereas they
neglect the fact that, if they had taken a different decision,
$A^\mu$ would also have changed. This seems reasonable at
first sight because $A^\mu$ is an aggregate quantity and its
dependence on each individual agent is weak. A more
careful analysis however shows that if agents properly
account for their impact on $A^\mu$, as in Eq. (3), a radically
different scenario arises: Rather than converging to
a unique stationary state as in the MG, the dynamics
(with $\lambda = 0$) converges to one of exponentially many (in
$N$) states - which are Nash equilibria [12] - characterized
by an optimal coordination. This change emerges in
the statistical mechanics approach with the breakdown
of replica symmetry (RS): While the Minority Game is
described by a replica symmetric theory, Nash equilibria
are described by a full replica symmetry broken (RSB)
phase [?]. Our aim is precisely that of studying the
coordination of adaptive agents in a complex world with
exponentially many optimal states (Nash equilibria).

The second key feature is that previous work has only
explored the dynamics of learning with an infinite memory
- i.e. with $\lambda = 0$ in Eq. (3) - and for a fixed structure
of interactions - i.e. with fixed (quenched) disorder $a^\mu_{s,i}$.
Our goal is to clarify the role of the different time-scales
involved in the learning dynamics. We shall first study the
case where the structure of interactions is fixed - which
corresponds to $a^\mu_{s,i}$ being the usual quenched disorder -
and then move to the more realistic case where the
structure of interactions changes over long time-scales.

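Following the lines of reasoning of Refs. [?], we
introduce a continuum time $\tau = \Gamma t/P$ and variables
$y_i(\tau) = \Gamma [U_{+,i}(t) - U_{-,i}(t)]/2$ in terms of which the
dynamics reads

$$ \frac{dy_i}{d\tau} = -\frac{\lambda}{\Gamma}\, y_i - h_i - \sum_{j \neq i} J_{i,j} \tanh(y_j) + \eta_i , \tag{4} $$

$$ h_i = \frac{1}{P} \sum_{\mu=1}^{P} \sum_{j=1}^{N} \frac{a^\mu_{+,i} - a^\mu_{-,i}}{2}\, \frac{a^\mu_{+,j} + a^\mu_{-,j}}{2} , \qquad
   J_{i,j} = \frac{1}{P} \sum_{\mu=1}^{P} \frac{a^\mu_{+,i} - a^\mu_{-,i}}{2}\, \frac{a^\mu_{+,j} - a^\mu_{-,j}}{2} , $$

with $\eta_i(\tau)$ a white noise with zero mean and correlations

$$ \langle \eta_i(\tau)\, \eta_j(\tau') \rangle \cong \frac{\Gamma \sigma^2}{\alpha N}\, J_{i,j}\, \delta(\tau - \tau') . $$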
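
The quenched couplings entering the continuum-time dynamics can be computed directly from the actions. The sketch below assumes that $h_i$ and $J_{i,j}$ are the averages over states written in the docstring, with $\xi = (a_+ - a_-)/2$ and $\omega = (a_+ + a_-)/2$ as hypothetical shorthand; the function names `couplings` and `drift` are likewise illustrative. Only the deterministic part of Eq. (4) is evaluated, not the noise.

```python
import math

def couplings(a):
    """a[mu][i] = (a_minus, a_plus). Returns (h, J) with
    h_i  = (1/P) sum_mu xi_i^mu * sum_j omega_j^mu
    J_ij = (1/P) sum_mu xi_i^mu * xi_j^mu
    where xi = (a_plus - a_minus)/2 and omega = (a_plus + a_minus)/2."""
    n_states, n_agents = len(a), len(a[0])
    h = [0.0] * n_agents
    J = [[0.0] * n_agents for _ in range(n_agents)]
    for row in a:
        xi = [(a_plus - a_minus) / 2 for a_minus, a_plus in row]
        omega_sum = sum((a_plus + a_minus) / 2 for a_minus, a_plus in row)
        for i in range(n_agents):
            h[i] += xi[i] * omega_sum / n_states
            for j in range(n_agents):
                J[i][j] += xi[i] * xi[j] / n_states
    return h, J

def drift(y, h, J, lam_over_gamma):
    """Deterministic part of Eq. (4):
    -(lam/Gamma) * y_i - h_i - sum_{j != i} J_ij * tanh(y_j)."""
    n = len(y)
    m = [math.tanh(v) for v in y]
    return [-lam_over_gamma * y[i] - h[i]
            - sum(J[i][j] * m[j] for j in range(n) if j != i)
            for i in range(n)]
```

Excluding the `j == i` term in `drift` is what encodes the agents' awareness of their own impact on the aggregate action.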

Refs. [?] have shown that, for $\lambda = 0$, the stationary
states of this dynamics are related to the local minima of

$$ \sigma^2 = H_0 + 2 \sum_i h_i m_i + \sum_{i \neq j} J_{i,j}\, m_i m_j , $$

where $H_0$ is a constant and $m_i = \langle \tanh(y_i) \rangle$. These states
are also Nash equilibria [?], which means that agents
achieve an optimal coordination. Since $\sigma^2$ takes its
minima for $m_i = \pm 1$ - which correspond to $y_i \to \pm\infty$ - the
stochastic force $\eta_i(\tau)$ is irrelevant in the late stages of the
dynamics, which is dominated by the deterministic drift
towards the Nash equilibrium.
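
The appearance of $\tanh(y_i)$ can be traced back to the choice rule of Eq. (2). As a short check, using only the two-strategy rule and the definition $y_i = \Gamma [U_{+,i} - U_{-,i}]/2$:

```latex
\mathrm{Prob}[s_i = \pm]
  = \frac{e^{\Gamma U_{\pm,i}}}{e^{\Gamma U_{+,i}} + e^{\Gamma U_{-,i}}}
  = \frac{1 \pm \tanh(y_i)}{2}
\quad\Longrightarrow\quad
\mathrm{Prob}[s_i = +] - \mathrm{Prob}[s_i = -] = \tanh(y_i)
```

So $\tanh(y_i)$ is the instantaneous bias of agent $i$ towards her $+$ strategy, and $m_i = \pm 1$ corresponds to an agent who always plays the same strategy.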

For $\lambda/\Gamma > 0$ we expect the stochastic force $\eta_i(\tau)$,
whose strength is itself proportional to $\sigma^2$, to compete
with the deterministic drift. Indeed the distribution of
$y_i$ will be cut off for $|y_i| \gg \Gamma/\lambda$: For small $\lambda$ we expect
that $\langle \tanh(y_i) \rangle$ is close to the values of $m_i$ which
minimize $\sigma^2$, with a spread in the distribution of $y_i$ around
its average which is maintained by the stochastic force.
When $\lambda$ increases we expect a transition to a phase where
agents are unable to coordinate because their memory is
too short for learning the interaction structure correctly:
The dynamics is dominated by the stochastic force $\eta_i$,
which is made even stronger by the fact that $\sigma^2/N \simeq 1$
is much larger than in the coordinated state. This
transition is captured by the statistical mechanics approach
of Ref. [?]. Neglecting the stochastic fluctuations induced by
$\eta_i$, which is legitimate only for $\Gamma \ll 1$, one can easily
prove, following Ref. [?], that $m_i = \langle \tanh y_i \rangle$ are given
by the solution of the minimization of the function

$$ H = \sigma^2 + \frac{\lambda}{\Gamma} \sum_i \left[ \log(1 - m_i^2) + 2 m_i \tanh^{-1}(m_i) \right] . \tag{5} $$
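
A one-line identity helps to see why the bracketed term of Eq. (5) reproduces the memory term of the dynamics. It is offered as a check of consistency, not as the derivation followed in the references:

```latex
\frac{d}{dm}\left[\log(1 - m^2) + 2m\tanh^{-1}(m)\right]
  = -\frac{2m}{1 - m^2} + 2\tanh^{-1}(m) + \frac{2m}{1 - m^2}
  = 2\tanh^{-1}(m)
```

The stationarity condition $\partial H/\partial m_i = 0$ therefore contains a term proportional to $\tanh^{-1}(m_i)$, which equals $y_i$ when fluctuations are neglected and $m_i = \tanh(y_i)$. This mirrors the linear decay term in $y_i$ of Eq. (4).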

In order to study the ground state properties of $H$ we
follow the same steps of Ref. [?]: We introduce an inverse
temperature $\beta$, we compute the partition function and
the free energy per agent and then we take averages over
the disordered variables $a^\mu_{s,i}$ with the replica method [?].
The free energy, within the RS Ansatz, reads

[displayed expression for $f(q, r, Q, R)$ not recoverable from the source; its surviving fragments involve the combination $\beta(Q - q)$]

where $Q = \frac{1}{N} \sum_i (m_i^a)^2$ and $q = \langle m_i^a m_i^b \rangle$ with $a \neq b$
labelling different replicas of the system; $R$ and $r$ arise
as Lagrange multipliers and $V_z(m)$ contains the terms
$-\sqrt{\alpha r}\, z m$, $(r - R) m^2$ and $\log(1 - m^2) + 2 m \tanh^{-1}(m)$
(the prefactors of the last two terms are illegible in the source). The ground state
properties of $H$ are obtained by solving the saddle point
equations [14] in the limit $\beta \to \infty$.

In the inset of Fig. 1 we compare the analytical
predictions for $\sigma^2$ and $Q$ with simulation results. We focus on
small $\alpha$ (i.e. $\alpha = 0.1$) where the effects we wish to discuss
are more evident. The small discrepancies between
numerical data and analytical curves may be due to RSB
effects. Note that a phase transition occurs at $\lambda_c \simeq 0.46\,\Gamma$,
where both $\sigma^2$ and $Q$ change their analytical behaviour.
We have studied this equilibrium phase transition in the
$(\lambda, 1/\Gamma)$ plane, confirming the critical line $\lambda_c = 0.46\,\Gamma$:
Open symbols in Fig. 1 refer to a static experiment where
we let the system equilibrate to a Nash equilibrium for
$\lambda = 0$ and then we move it slowly along lines $\lambda\Gamma = \mathrm{const}$.

The situation changes when the system starts from
scratch [$U_{s,i}(0) = 0$ $\forall \{s, i\}$] in each run. Depending on $\lambda$
and $\Gamma$, the dynamics may lead the system to a stationary
regime (different from the static one) which is
characterized by larger fluctuations (i.e. larger $\sigma^2$). These
dynamical effects make the phase diagram more complex
in the $\lambda < \lambda_c$ region (see Fig. 1): In region I the system always
relaxes to the static equilibrium, in II it sometimes
converges to equilibrium and sometimes gets trapped in the
metastable regime with large fluctuations, while in III it
never reaches equilibrium. The presence of this dynamical
transition implies that the analysis in terms of Nash
equilibria is no longer enough to predict the collective
behavior of the system in a large part of the phase diagram,
i.e. for high learning rates and short memory.

FIG. 1. Phase diagram: static (o) and dynamic (•) critical
lines obtained from the simulation. The full line represents
the RS critical line. The dashed lines are guides to the eye.
Inset: $Q$ (•) and $\sigma^2/N$ (o) as a function of $\lambda/\Gamma$ obtained from
the simulation. The lines represent the RS solution.



When the external world is non-stationary, i.e. changes
with time, the adaptation task becomes even harder. We
mimic the modification of the external world as follows:
Every $\tau$ time steps a randomly chosen state of the world is
removed and a new one replaces it (in order to keep $P$
constant); here $\tau$ denotes the period of the changes, measured
in time steps, not the rescaled time of Eq. (4). In practice, we
randomly choose an index $\mu$ and
we redraw the strategies' actions $a^\mu_{s,i}$ for all $i$ and $s$.
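
The world-modification rule can be sketched in a few lines. The helper names `refresh_state` and `maybe_refresh`, and the convention that the first change happens at step `period` rather than at step 0, are assumptions of this example.

```python
import random

def refresh_state(actions, rng):
    """Replace one randomly chosen state mu by redrawing a^mu_{s,i} = +1 or -1
    for every agent i and both strategies s. The number of states P is unchanged."""
    mu = rng.randrange(len(actions))
    actions[mu] = [(rng.choice((-1, 1)), rng.choice((-1, 1))) for _ in actions[mu]]
    return mu

def maybe_refresh(actions, t, period, rng):
    """Call once per time step t; one state is redrawn every `period` steps.
    Returns the index of the redrawn state, or None if nothing changed."""
    if t > 0 and t % period == 0:
        return refresh_state(actions, rng)
    return None
```

Because only one of the $P$ states changes at a time, the interaction structure drifts slowly: a complete turnover of the world takes of order $P$ periods.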



Here we focus on the results of the simulations done
with $\tau = 10^3$, $\Gamma = \infty$, $NP = 10^4$ and many values of $\lambda$.
The results do not depend on the initial conditions.




FIG. 2. Evolution of $\sigma^2/N$ with simulation time in a
non-stationary world ($\tau = 10^3$) for 50 different samples and two
values of $\lambda$ ($NP = 10^4$, $\alpha = 0.1$ and $\Gamma = \infty$).



In the upper panel of Fig. 2 we show the relaxation
of $\sigma^2/N$ for $\lambda = 2.5$: As expected, it starts from 1 and
converges to its equilibrium value. Note that $\tau = 10^3$
has been chosen in order to allow the system to reach a
cooperative behaviour before the world starts changing.
For this value of $\lambda$ the system is robust with respect to
changes of the world: Apart from occasional excursions
to states with large $\sigma^2$, agents are able to adapt
themselves to the evolving interaction structure.

In the lower panel we present the evolution of $\sigma^2/N$
for $\lambda = 3.5$ (i.e. with shorter memory) in 50 different
samples. The behaviour is now completely different:
After having reached a low value of $\sigma^2/N$ (cooperation) the
system undergoes a sharp transition and $\sigma^2/N$ jumps to
a high value. The players are no longer able to adapt
to the changing world and they start playing in an
uncoordinated way. Occasionally agents may achieve a good
coordination with small $\sigma^2$, but they eventually always go back
to uncoordinated states with large $\sigma^2$.

For large times, the instantaneous values of $\sigma^2/N$ have
a roughly bimodal distribution: They are either low
($\sim 10^{-2}$) or high ($\sim 1$). In Fig. 3 we plot the average
of the low (o) and of the high (□) values (these averages
can be defined in an unambiguous way thanks to the gap
between low and high $\sigma^2$ values). In the inset we report
the fraction of samples that spend the last decade of
simulation time in the high $\sigma^2$ regime. In a whole
intermediate range around $\lambda_c \approx 3.3$ we find that
coordinated states with small $\sigma^2$
coexist with wildly fluctuating states ($\sigma^2 > 1$).

FIG. 3. Average low (o) and high (□) $\sigma^2/N$ as a function
of $\lambda$ ($NP = 10^4$, $\alpha = 0.1$, $\Gamma = \infty$ and $\tau = 10^3$).
The arrow indicates a transition from the cooperative to the
non-cooperative regime. The horizontal dotted line is the
$\sigma^2/N$ value with fixed world ($\tau = \infty$). Inset: Probability of
being in a non-cooperative regime as a function of $\lambda$.
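
The separation into low and high values described above can be sketched as a simple threshold split. The function name `split_low_high` and the threshold are hypothetical: the text only states that a gap exists between values of order $10^{-2}$ and values of order 1, so any cut inside that gap (for example 0.1) would serve. The returned fraction is a simple proxy computed over values, not the per-sample statistic shown in the inset of Fig. 3.

```python
def split_low_high(values, threshold):
    """Split a non-empty list of sigma^2/N values at `threshold` and return
    (mean of low values, mean of high values, fraction of high values).
    A mean is None when its group is empty."""
    low = [v for v in values if v < threshold]
    high = [v for v in values if v >= threshold]

    def mean(xs):
        return sum(xs) / len(xs) if xs else None

    return mean(low), mean(high), len(high) / len(values)
```

For instance, `split_low_high([0.01, 0.02, 1.0, 1.2], 0.1)` separates the two small values from the two large ones and reports that half of the values lie in the high regime.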

Two facts in Fig. 3 are worth noting. First, the minimum
of $\sigma^2$, corresponding to the best cooperation, is no longer
located at $\lambda = 0$ (i.e. infinite memory). In other words,
in a non-stationary environment the agents play better
with a finite memory, which allows them to take
decisions based more on the recent past than on the
far past. The minimum they can attain is very near to
the $\sigma^2/N$ value in an unchanging world (shown with a
horizontal line in Fig. 3). The second remarkable fact
is that the transition from a coordinated state to a high
$\sigma^2$ regime when $\lambda$ increases - which was continuous in
a fixed world - shows features of first order transitions
such as discontinuities and phase coexistence.

In conclusion, we have extended the replica solution of
the Minority Game to the case where agents have finite
memory and finite learning rates. We have proven that a
phase transition between phases with low and high $\sigma^2$
exists as a function of $\lambda/\Gamma$. We have also shown, by means
of computer simulations, that a dynamical phase
transition exists for high values of $\lambda$ (short memories), and that
this dynamic phase transition is responsible for a
non-cooperative behaviour of agents. Furthermore we have
shown, by numerical simulation, that when the structure
of the interactions is non-stationary, agents with infinite
memory behave worse than agents with a finite memory.
Under these conditions we recover again a scenario where
agents with too short a memory display a first order
transition from a cooperative to a non-cooperative phase.